March 9, 1990

Abstract  This paper concerns differential equations which contain strong mixing random processes. The solution process is shown to be well approximated by a deterministic trajectory, over an infinite time interval, using the interplay between the rate of fluctuations of the random process and the rate of the $\varphi$ mixing. An application of the result is given for analysing synaptic modifications in Neural Networks.

1. Introduction

The mathematical theory of stochastic differential equations is concerned mainly with the study of Itô equations and the associated Markov process. Mostly, the results on non-Itô type equations have been concerned with the conditions under which $x_\epsilon(t)$ converges (as $\epsilon \to 0$) to a diffusion process on finite intervals $[0, T/\epsilon]$ (cf. Stratonovich, 1963; Cogburn and Hersh, 1973; Papanicolaou and Kohler, 1974; Blankenship and Papanicolaou, 1977). Averaging results for random differential equations are usually discussed in conjunction with the law of large numbers (Kohler and Papanicolaou, 1976) and with the central limit theorem for $(x_\epsilon(t) - y_\epsilon(t))/\sqrt{\epsilon}$ on $[0,T]$ (cf. Khasminskii, 1966; White, 1976). Geman (1979) showed that the solution process of a random differential equation which contains a strong mixing random process is well approximated by a deterministic trajectory over a finite time interval and, for a more restricted class of systems, over the infinite time interval. An analogous analysis was carried out on Itô type equations by Vrkoc (1966) and by Lybrand (1975).

In this paper we shall continue the direction taken by Geman and approximate the solution process by a deterministic trajectory over an infinite time interval, using the interplay between the rate of fluctuations of the random process and the rate of the $\varphi$ mixing, yielding a result for a wide family of nonlinear random differential equations. We will establish conditions under which the random solution stays close, in the $L^2$ sense, to the associated deterministic solution. The result is particularly useful when a converging deterministic equation is approximated by a random equation that is more computationally feasible. Section 4 is devoted to such an application, in the theory of synaptic modification in Neural Networks.

A similar analysis was carried out on the discrete time version of such equations; see Ljung (1978), Kushner and Clark (1978), Dupuis and Kushner (1987), and the references therein.


2.  Formulation  and  statement  of  the  problem 


In this section we briefly summarize the relevant results from Geman (1977, 1979).

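Let $\phi(t,\omega)$ be a bounded stationary stochastic process with $\mathcal{F}_0^t$ and $\mathcal{F}_t^\infty$ the $\sigma$-fields generated by $\{\phi(\tau,\omega) : 0 \le \tau \le t\}$ and $\{\phi(\tau,\omega) : t \le \tau < \infty\}$ respectively. Let the signed measure $\nu_{t,\delta}$ be defined on $(\Omega \times \Omega, \mathcal{F}_0^t \times \mathcal{F}_{t+\delta}^\infty)$ by

$$\nu_{t,\delta}(B) = P(\omega : (\omega,\omega) \in B) - P \times P(B), \quad \text{for } B \in \mathcal{F}_0^t \times \mathcal{F}_{t+\delta}^\infty.$$

For any $B \in \mathcal{F}_0^t \times \mathcal{F}_{t+\delta}^\infty$ the set $\{\omega : (\omega,\omega) \in B\}$ is in $\mathcal{F}$, and since the collection of such sets is also a monotone class, $\nu_{t,\delta}$ is well defined. The stochastic process $\phi(t,\omega)$ is said to have Type II $\varphi$ mixing if

$$\varphi(\delta) = \sup_{t \ge 0} \; \sup_{A \in \mathcal{F}_0^t \times \mathcal{F}_{t+\delta}^\infty} |\nu_{t,\delta}(A)| \to 0 \quad \text{as } \delta \to \infty.$$

Remark on $\varphi$ mixing: The results we describe hold also for Type I mixing, which was introduced by Volkonskii and Rozanov (1959), since for both types we have $|\nu|_{t,\delta}(\Omega \times \Omega) \le 2\varphi(\delta)$.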
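Let $\epsilon$ be a positive number, and consider the system:

$$\begin{aligned}
\dot{x}_\epsilon(t,\omega) &= H(x_\epsilon(t,\omega),\omega,t/\epsilon), \\
\dot{y}_\epsilon(t) &= G_\epsilon(y_\epsilon(t),t), \\
x_\epsilon(0,\omega) &= y_\epsilon(0) = x_0 \in R^n.
\end{aligned} \tag{2.1}$$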

Assume:

1. $H$ is jointly measurable with respect to its three arguments.

2. $G_\epsilon(x,t) = E[H(x,\omega,t/\epsilon)]$, and for all $i$ and $j$, $\frac{\partial}{\partial x_j} G_i(x,t)$ exists and is continuous in $(x,t)$.

3. For some $T > 0$:

a. There exists a unique solution, $x(t,\omega)$, on $[0,T]$ for almost all $\omega$; and

b. A solution to

$$\frac{\partial}{\partial t} g(t,s,x) = G(g(t,s,x),t), \quad g(s,s,x) = x,$$

exists on $[0,T] \times [0,T] \times R^n$.

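The following notations will be used:

1. $H_\epsilon(x_\epsilon(t,\omega),\omega,t) \stackrel{\text{def}}{=} H(x_\epsilon(t,\omega),\omega,t/\epsilon)$.

2. $g_s(t,s,x) = (\partial/\partial s)\,g(t,s,x)$.

3. $g_x(t,s,x)$ = the $n \times n$ matrix with $(i,j)$ component $(\partial/\partial x_j)\,g_i(t,s,x)$.

4. For $H(x,\omega,t)$ define the families of $\sigma$-fields $\mathcal{F}_0^t$ and $\mathcal{F}_t^\infty$ such that, for each $t \ge 0$, $\mathcal{F}_0^t$ contains the $\sigma$-field generated by

$$\{H(x,\omega,\tau) : 0 \le \tau \le t,\ x \in R^n\},$$

and $\mathcal{F}_t^\infty$ contains the $\sigma$-field generated by

$$\{H(x,\omega,\tau) : t \le \tau < \infty,\ x \in R^n\}.$$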
The relation between the random differential equation and its averaged version for system (2.1) under conditions (1), (2), and (3) is given by:

Lemma (Geman 1977) For any $C^1$ function $K : R^n \to R^1$ and $t \in [0,T)$:

$$E[K(x(t))] = K(y(t)) + \int_0^t \int_{\Omega\times\Omega} \Big(\frac{\partial}{\partial x} K(g(t,s,x(s,\omega)))\Big) \cdot H(x(s,\omega),\eta,s)\, d\nu_{s,0}\, ds,$$

provided that

$$\Big(\frac{\partial}{\partial x} K(g(t,s,x(s,\omega)))\Big) \cdot H(x(s,\omega),\eta,s) \quad \text{and} \quad \Big(\frac{\partial}{\partial x} K(g(t,s,x(s,\omega)))\Big) \cdot H(x(s,\omega),\omega,s)$$

are absolutely integrable on $\Omega \times \Omega \times [0,T]$, with respect to $dP(\omega)\,dP(\eta)\,ds$.

The proof of the lemma is based on the relationship between the initial conditions in time and in space for an ODE, namely: If $g(t,s,x)$ is the function satisfying

$$\frac{\partial}{\partial t} g(t,s,x) = G(g(t,s,x),t)$$

then

$$g_s(t,s,x) = -g_x(t,s,x)\,G(x,s)$$

for all $t \in [0,\infty)$, $s \in [0,\infty)$, and $x \in R^n$. This follows from the observation that $g(t,s,x)$ is constant along trajectories of the form $(s, x(s))$ (cf. Hartman, 1964, chap. 5).
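The identity $g_s = -g_x G$ can be checked numerically on a case where the flow is known in closed form. The sketch below is only an illustration: the field $G(x,t) = -x + \sin t$, the function names, and the evaluation point are assumptions made here, not taken from the paper.

```python
import math

def G(x, t):
    """Toy averaged field (an assumption for illustration): G(x, t) = -x + sin(t)."""
    return -x + math.sin(t)

def g(t, s, x):
    """Closed-form flow of dy/dt = G(y, t): value at time t of the solution equal to x at time s."""
    decay = math.exp(-(t - s))
    forced = 0.5 * (math.sin(t) - math.cos(t)) - 0.5 * decay * (math.sin(s) - math.cos(s))
    return x * decay + forced

def check_identity(t, s, x, h=1e-5):
    """Return (g_s, -g_x * G(x, s)), both estimated with central finite differences."""
    g_s = (g(t, s + h, x) - g(t, s - h, x)) / (2 * h)
    g_x = (g(t, s, x + h) - g(t, s, x - h)) / (2 * h)
    return g_s, -g_x * G(x, s)

lhs, rhs = check_identity(t=2.0, s=0.5, x=1.3)
print(lhs, rhs)  # the two values should agree up to finite-difference error
```

For this linear field both sides equal $e^{-(t-s)}(x - \sin s)$, which is what the two printed numbers approximate.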

Theorem (Geman, 1977) Finite time averaging. Assume also that:

4. There exist continuous functions $B_1(r,t)$, $B_2(r,t)$, and $B_3(r,t)$, such that for all $i$, $j$, $k$, $\tau \ge 0$, and $\omega$:

a. $|H_i(x,\omega,t,\tau)| \le B_1(|x|,t)$;

b. $|(\partial/\partial x_j) H_i(x,\omega,t,\tau)| \le B_2(|x|,t)$;

c. $|(\partial^2/\partial x_j \partial x_k) H_i(x,\omega,t,\tau)| \le B_3(|x|,t)$.

5. $\sup_{\epsilon > 0,\ t \in [0,T]} |y_\epsilon(t)| \le B_4$ for some $B_4$ and $T$.

Then

$$\sup_{t \in [0,T]} |x_\epsilon(t) - y_\epsilon(t)| \to 0 \quad \text{as } \epsilon \to 0,$$

in probability.

3. Averaging on $[0,\infty)$

When averaging on an infinite interval we require that $\epsilon$ be a function of $t$ with $\epsilon \searrow 0$, meaning that the mixing rate becomes stronger in time. More specifically, let $\epsilon$ be a function of the form $\epsilon(t) = \epsilon_0 \bar\epsilon(t)$, where $\bar\epsilon$ is monotonically decreasing to zero in time.

The above lemma still holds when $x$, $H$, $g$ and $G$ are replaced by $x_\epsilon$, $H_\epsilon$, $g_\epsilon$ and $G_\epsilon$ respectively, and also when $\epsilon$ becomes a function of $t$.

In order for the approximation to hold on $[0,\infty)$ we require that $B_1$, $B_2$, $B_3$ be constants in condition 4 (this will be relaxed later), extend condition 5 to hold for $t \in [0,\infty)$, and add the following relation between the rate of the mixing of $H$ and the convergence of $\epsilon$ to zero:

6. $\exists\, \gamma > 0$, $c > 0$, such that $\varphi(\delta) \le \delta^{-\gamma}$ and $\bar\epsilon(t) \le t^{-(\frac{1}{\gamma} + 1 + c)}$, for a monotone decreasing $\bar\epsilon$.

Theorem 3.1 Assume $H_\epsilon$ is of Type II $\varphi$ mixing, and satisfies conditions 1-6; then

$$\lim_{\epsilon_0 \to 0} \sup_{t \ge 0} E\,|x_\epsilon(t) - y_\epsilon(t)|^2 = 0.$$


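Proof: Assume first that $t$ is an integer. Fix $\epsilon_0$ and apply the lemma to the system using $K(x) = |x - y_\epsilon(t)|^2$:

$$E\,|x_\epsilon(t) - y_\epsilon(t)|^2 = \Big| \int_0^t \int_{\Omega\times\Omega} \Big(\frac{\partial}{\partial x} K(g_\epsilon(t,s,x_\epsilon(s,\omega)))\Big) \cdot H_\epsilon(x_\epsilon(s,\omega),\eta,s)\, d\nu_{s,0}\, ds \Big|$$

$$\le \sum_{k=1}^{\infty} \Big| \int_{k-1}^{k} \int_{\Omega\times\Omega} \Big(\frac{\partial}{\partial x} K(g_\epsilon(t,s,x_\epsilon(s,\omega)))\Big) \cdot H_\epsilon(x_\epsilon(s,\omega),\eta,s)\, d\nu_{s,0}\, ds \Big|.$$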


N.  Intrator 




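For any fixed $\delta_k > 0$ (to be chosen later), since each integral is bounded we can write, $\forall k$:

$$\int_{k-1}^{k} \int_{\Omega\times\Omega} \Big(\frac{\partial}{\partial x} K(g_\epsilon(t,s,x_\epsilon(s,\omega)))\Big) \cdot H_\epsilon(x_\epsilon(s,\omega),\eta,s)\, d\nu_{s,0}\, ds = I + II + III,$$

where

$$I = \int_{k-1}^{k-1+\delta_k} \int_{\Omega\times\Omega} \Big(\frac{\partial}{\partial x} K(g_\epsilon(t,s,x_\epsilon(s,\omega)))\Big) \cdot H_\epsilon(x_\epsilon(s,\omega),\eta,s)\, d\nu_{s,0}\, ds,$$

$$II = \int_{k-1+\delta_k}^{k} \int_{\Omega\times\Omega} \Big(\frac{\partial}{\partial x} K(g_\epsilon(t,s,x_\epsilon(s-\delta_k,\omega)))\Big) \cdot H_\epsilon(x_\epsilon(s-\delta_k,\omega),\eta,s)\, d\nu_{s,0}\, ds,$$

$$III = \int_{k-1+\delta_k}^{k} \int_{\Omega\times\Omega} \Big\{ \Big(\frac{\partial}{\partial x} K(g_\epsilon(t,s,x_\epsilon(s,\omega)))\Big) \cdot H_\epsilon(x_\epsilon(s,\omega),\eta,s) - \Big(\frac{\partial}{\partial x} K(g_\epsilon(t,s,x_\epsilon(s-\delta_k,\omega)))\Big) \cdot H_\epsilon(x_\epsilon(s-\delta_k,\omega),\eta,s) \Big\}\, d\nu_{s,0}\, ds.$$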


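The bounds on $x_\epsilon$ and its derivatives, and the smoothness of $K$, imply that I is $O(\delta_k)$. In the second term we can replace $\nu_{s,0}$ by $\nu_{s-\delta_k,\delta_k}$, since these measures agree on $(\Omega \times \Omega, \mathcal{F}_0^{s-\delta_k} \times \mathcal{F}_s^\infty)$ and since $x_\epsilon(s-\delta_k,\omega)$ is $\mathcal{F}_0^{s-\delta_k}$ measurable. Since $\nu_{t,\delta}$ is the difference of two probability measures, the total variation measure satisfies:

$$|\nu|_{t,\delta}(\Omega \times \Omega) \le 2, \quad \text{and} \quad |\nu|_{t,\delta}(\Omega \times \Omega) = 2 \sup_A |\nu_{t,\delta}(A)|,$$

therefore, with Type II (or I) mixing: $|\nu|_{t,\delta}(\Omega \times \Omega) \le 2\varphi(\delta)$. Applying this to the second integral and using the above bounds again we get that II is $O(\varphi(\delta_k/\epsilon(k-1)))$. The last term is also $O(\delta_k)$ from the smoothness of $H_\epsilon$ and of $x_\epsilon$.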
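The total variation identity used here, $|\nu|(\Omega\times\Omega) = 2\sup_A |\nu(A)|$ for a difference of two probability measures, can be seen on a four-point space. The sketch below is a hypothetical finite example (a symmetric $\pm 1$ pair with correlation `rho`), not an object from the paper; it enumerates all subsets to compute the supremum.

```python
from itertools import chain, combinations

def nu_atoms(rho):
    """Per-atom values of nu = (joint law of a symmetric +-1 pair with correlation rho)
    minus (product of its marginals)."""
    atoms = [(a, b) for a in (-1, 1) for b in (-1, 1)]
    joint = {ab: 0.25 * (1 + rho * ab[0] * ab[1]) for ab in atoms}
    product = {ab: 0.25 for ab in atoms}
    return {ab: joint[ab] - product[ab] for ab in atoms}

def total_variation(nu):
    """|nu|(whole space): sum of absolute atom masses."""
    return sum(abs(v) for v in nu.values())

def sup_over_sets(nu):
    """sup_A |nu(A)| by brute-force enumeration of all subsets A."""
    atoms = list(nu)
    subsets = chain.from_iterable(combinations(atoms, r) for r in range(len(atoms) + 1))
    return max(abs(sum(nu[a] for a in subset)) for subset in subsets)

nu = nu_atoms(0.4)
print(total_variation(nu), 2 * sup_over_sets(nu))
```

Because $\nu$ has total mass zero, its positive and negative parts carry equal mass, which is why the two printed quantities coincide.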

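Now choose $\delta_k = \sqrt{\epsilon_0}\,(k-1)^{-(1+\frac{1}{2}c)}$, $k > 1$; then since $\epsilon(k-1) \le \epsilon_0 (k-1)^{-(\frac{1}{\gamma}+1+c)}$, we get $\delta_k/\epsilon(k-1) \ge \frac{1}{\sqrt{\epsilon_0}}(k-1)^{\frac{1}{\gamma}+\frac{1}{2}c}$. From the condition on $\varphi$ we have $\varphi(\delta_k/\epsilon(k-1)) \le \epsilon_0^{\gamma/2}(k-1)^{-(1+\frac{1}{2}\gamma c)}$. Since $\gamma > 0$, the sum

$$\sum_{k>1} \Big[ O(\delta_k) + O(\varphi(\delta_k/\epsilon(k-1))) \Big] = O\big(\epsilon_0^{1/2} + \epsilon_0^{\gamma/2}\big).$$

For the segment of $t$ between two integers, an analogous argument is applied, yielding an extra term of the form $O(\epsilon_0^{1/2} + \epsilon_0^{\gamma/2})$; therefore $E\,|x_\epsilon(t) - y_\epsilon(t)|^2 = O(\epsilon_0^{1/2} + \epsilon_0^{\gamma/2})$ uniformly in $t$.

This implies that

$$\sup_{t \ge 0} E\,|x_\epsilon(t) - y_\epsilon(t)|^2 = O\big(\epsilon_0^{1/2} + \epsilon_0^{\gamma/2}\big),$$

$$\lim_{\epsilon_0 \to 0} \sup_{t \ge 0} E\,|x_\epsilon(t) - y_\epsilon(t)|^2 = 0.$$

□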
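The exponent bookkeeping in this step can be checked numerically. The sketch below assumes condition 6 in the form $\varphi(\delta) \le \delta^{-\gamma}$, $\epsilon(t) \le \epsilon_0 t^{-(1/\gamma+1+c)}$, and the choice $\delta_k = \sqrt{\epsilon_0}(k-1)^{-(1+c/2)}$; these forms are a reading of the damaged source, and the function names and parameter values are hypothetical.

```python
import math

def proof_terms(k, eps0, gamma, c):
    """For k > 1 return (delta_k, delta_k / eps_bound, bound on phi) under the assumed forms."""
    eps_bound = eps0 * (k - 1) ** (-(1.0 / gamma + 1.0 + c))   # upper bound on eps(k-1)
    delta_k = math.sqrt(eps0) * (k - 1) ** (-(1.0 + 0.5 * c))
    ratio = delta_k / eps_bound                                # lower bound on delta_k / eps(k-1)
    phi_bound = ratio ** (-gamma)                              # phi(d) <= d**(-gamma), phi decreasing
    return delta_k, ratio, phi_bound

def partial_sums(K, eps0, gamma, c):
    """Partial sums over 2 <= k <= K of delta_k and of the phi bound."""
    s_delta = s_phi = 0.0
    for k in range(2, K + 1):
        d, _, ph = proof_terms(k, eps0, gamma, c)
        s_delta += d
        s_phi += ph
    return s_delta, s_phi

eps0, gamma, c = 0.01, 2.0, 1.0
print(proof_terms(5, eps0, gamma, c))
print(partial_sums(2000, eps0, gamma, c))
```

With these parameters the $\delta_k$ sum stays below $\sqrt{\epsilon_0}\,\zeta(3/2)$ and the $\varphi$ sum below $\epsilon_0^{\gamma/2}\,\zeta(2)$, matching the $O(\epsilon_0^{1/2} + \epsilon_0^{\gamma/2})$ order of the bound.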

The following problem is closely related: For fixed $\omega$, let $H(x,\omega,t)$ map $R^n \times R^m \times R^1$ into $R^n$. Assume that for each $x$, $H(x,\omega,t)$ is a mixing process, and for each $x$ and $t$ define $G(x,t) = E[H(x,\omega,t)]$. Consider the random equation

$$\dot{x}_\epsilon(t,\omega) = \epsilon H(x_\epsilon(t,\omega),\omega,t), \quad x_\epsilon(0,\omega) = x_0, \tag{3.1}$$

with its averaged equation

$$\dot{y}_\epsilon(t) = \epsilon G(y_\epsilon(t),t), \quad y_\epsilon(0) = x_0. \tag{3.2}$$

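For equation (3.2) condition 6 becomes:

6'. $\exists\, \gamma > 0$, such that

i) $\varphi(\delta) \le \delta^{-\gamma}$,

ii) $\epsilon(t) = \epsilon_0\, r(t)\, t^{-p}$, for $p = \frac{1+c}{2+c}$, $c > \frac{1}{\gamma}$, and $\forall t$: $0 < c_1 \le r(t) \le c_2$.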

Theorem 3.2 Under the assumptions of theorem (3.1) and (6'),

$$\lim_{\epsilon_0 \to 0} \sup_{t \ge 0} E\,|x_\epsilon(t) - y_\epsilon(t)|^2 = 0.$$

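Proof: Apply the change of variables $t = \frac{1}{\epsilon_0}\tau^{2+c}$, $dt = \frac{1}{\epsilon_0}(2+c)\tau^{1+c}\,d\tau$, to equation (3.1). With $p = \frac{1+c}{2+c}$ the powers of $\tau$ cancel, since $p(2+c) = 1+c$:

$$\frac{dx_\epsilon}{d\tau} = \epsilon_0^{p}\,\tau^{-p(2+c)}\, r(\tau^{2+c}/\epsilon_0)\, H(x_\epsilon,\omega,\tau^{2+c}/\epsilon_0)\,(2+c)\,\tau^{1+c} = \epsilon_0^{p}(2+c)\, r(\tau^{2+c}/\epsilon_0)\, H(x_\epsilon,\omega,\tau/\epsilon(\tau)),$$

for $\epsilon(\tau) = \epsilon_0 \tau^{-(1+c)}$. The prefactor $\epsilon_0^{p}(2+c)\,r$ is bounded, so the bounds of condition 4 are preserved. Now observe that $\epsilon(\tau)$ satisfies condition (6) in theorem (3.1), which gives the desired result. □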
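The time change in this proof can be sanity-checked numerically. The sketch below assumes $p = (1+c)/(2+c)$ and $t = \tau^{2+c}/\epsilon_0$, takes $r \equiv 1$ for simplicity, and uses a hypothetical helper name; it confirms that the rescaled equation has a $\tau$-independent prefactor and that the fast-time argument equals $\tau/\epsilon(\tau)$ with $\epsilon(\tau) = \epsilon_0\tau^{-(1+c)}$.

```python
def time_change(tau, eps0, c):
    """Return (prefactor of H after the time change, original time t, tau / eps(tau))."""
    p = (1.0 + c) / (2.0 + c)
    t = tau ** (2.0 + c) / eps0                       # t as a function of the new time tau
    dt_dtau = (2.0 + c) * tau ** (1.0 + c) / eps0
    prefactor = eps0 * t ** (-p) * dt_dtau            # eps(t) * dt/dtau with r(t) = 1
    eps_tau = eps0 * tau ** (-(1.0 + c))              # the eps used in the rescaled system
    return prefactor, t, tau / eps_tau

eps0, c = 0.05, 1.5
for tau in (1.0, 3.0, 10.0):
    print(time_change(tau, eps0, c))                  # prefactor is the same for every tau
```

Under these assumptions the prefactor equals $\epsilon_0^{p}(2+c)$ for every $\tau$, which is what lets the rescaled system be treated with the fast-time form of Theorem 3.1.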
As can be seen from the proof, $p$ has to satisfy the conditions $\frac{1}{2} < p < 1$, and $\epsilon(t)$ has to be greater than $t^{-1}$ so that $r(t) \ge c_1 > 0$, which allows the invocation of the previous theorem. It follows that if $\epsilon(t) = t^{-1}$, convergence is assured for any Type II mixing. Obviously, $p$ may be larger than 1, since $\epsilon$ may be split into two functions, one bounded and the other satisfying the conditions of the theorem. The same argument holds for $r(t)$; however, it is clear that one would like $\epsilon$ to go to zero as slowly as possible, since then, if the averaged version has a limit, the convergence rate of both equations to that limit is inversely proportional to $p$.

It is possible to extend the theory to the cases where the partial derivatives of $H$ have a polynomial growth in time. Then $\epsilon$ has to decrease faster so that the above integrals may still be controlled. We get the following theorem:

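Theorem 3.3 Assume that $B_1$, $B_2$, $B_3$, and $B_4$ are bounded by $t^\alpha$ for some $\alpha > 0$ in condition 4 of theorem 3.1, and replace condition 6 with the following:

6. $\exists\, \gamma > 0$, $c > \frac{1}{\gamma}$, such that $\varphi(\delta) \le \delta^{-\gamma}$, and $\epsilon(t) = \epsilon_0\bar\epsilon(t)$ with $\bar\epsilon(t) \le t^{-(1+c+3\alpha)}$, for a monotone decreasing $\bar\epsilon$. Then

$$\lim_{\epsilon_0 \to 0} \sup_{t \ge 0} E\,|x_\epsilon(t) - y_\epsilon(t)|^2 = 0.$$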

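Proof: When applying the lemma as before we get the following:

$$I = \sum_k O(\delta_k)\,(k-1)^{2\alpha},$$

$$II = \sum_k \ldots,$$

$$III = \sum_k O(\delta_k)\,k^{3\alpha}.$$

Now choose $\delta_k = \sqrt{\epsilon_0}\,(k-1)^{-(\frac{1}{2}(c-\frac{1}{\gamma})+1+3\alpha)}$; then since $\epsilon(t) \le \epsilon_0 t^{-(1+c+3\alpha)}$, we get just as before $\delta_k/\epsilon(k-1) \ge \frac{1}{\sqrt{\epsilon_0}}(k-1)^{\frac{1}{2}(c+\frac{1}{\gamma})}$. The rest of the proof follows exactly as before. □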
Extending theorem 3.2 to the case where the partial spatial derivatives are bounded by a polynomial in $t$ is done by absorbing the growth of $H$ into $\epsilon$, which gives the following corollary:

Corollary 3.4 Assume that $B_1$, $B_2$, $B_3$ and $B_4$ are bounded by $t^\alpha$ for some $\alpha > 0$ in condition 4 of theorem 3.1, and replace condition 6' in theorem 3.2 with the following:

6'. $\exists\, \gamma > 0$, such that

i) $\varphi(\delta) \le \delta^{-\gamma}$,

ii) $\epsilon(t) = \epsilon_0\, r(t)\, t^{-(\alpha+p)}$, for $p = \frac{1+c}{2+c}$, $c > \frac{1}{\gamma}$, and $\forall t$: $0 < c_1 \le r(t) \le c_2$. Then

$$\lim_{\epsilon_0 \to 0} \sup_{t \ge 0} E\,|x_\epsilon(t) - y_\epsilon(t)|^2 = 0.$$

An important observation has to be made here: If the deterministic version represents a converging trajectory, e.g., if the equation represents a gradient descent, then as long as $\epsilon(t) \ge t^{-1}$, the deterministic version will still converge to a true local minimum. However, if $\epsilon(t)$ decays faster than $t^{-1}$, so that $\int_0^\infty \epsilon(\tau)\,d\tau < \infty$, the convergence of the deterministic equation is not assured, which implies that the convergence of the stochastic version to a true local minimum is not guaranteed.
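A one-dimensional worked case makes the role of $\int \epsilon$ concrete. The sketch below is an illustration under assumptions chosen here: gradient descent on $y^2/2$, i.e. $\dot y = -\epsilon(t)\,y$ with $\epsilon(t) = t^{-p}$ on $[1,\infty)$, for which $y(t) = y_0 \exp(-\int_1^t \tau^{-p}\,d\tau)$ in closed form. The function name is hypothetical.

```python
import math

def averaged_decay(p, t, y0=1.0):
    """Closed-form y(t) for dy/dt = -t**(-p) * y on [1, t] with y(1) = y0."""
    if p == 1.0:
        integral = math.log(t)                        # integral of 1/tau from 1 to t
    else:
        integral = (t ** (1.0 - p) - 1.0) / (1.0 - p)
    return y0 * math.exp(-integral)

print(averaged_decay(0.5, 1e4))    # integral diverges: y approaches the minimum at 0
print(averaged_decay(1.0, 1e6))    # borderline case: y(t) = 1/t, still tends to 0
print(averaged_decay(2.0, 1e12))   # integral converges to 1: y stalls near exp(-1)
```

For $p \le 1$ the trajectory reaches the minimum, while for $p = 2$ it freezes at $y_0 e^{-1}$, away from the minimum, which is the failure mode described above.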

4.  An  application  to  the  synaptic  modification  equations  of  a  BCM  neuron 

In this section, we apply the theorem to a random differential equation representing the law governing synaptic weight modification in the BCM theory for learning and memory in neurons (Bienenstock et al., 1982). We start with a short review of the notations and definitions of BCM theory; a more thorough review can be found in Intrator (1990), and the references therein.

Consider a neuron whose input is the vector $x = (x_1, \ldots, x_N)$, which has a synaptic-weight vector $m = (m_1, \ldots, m_N)$, both in $R^N$, and activity (in the linear region) $c = x \cdot m$. The input $x$ is assumed to be a stochastic process of Type II $\varphi$ mixing, bounded, and piecewise constant. Let $\Theta_m = E[(x \cdot m)^2]$ and $\phi(c, \Theta_m) = c^2 - \frac{4}{3} c\,\Theta_m$. Here $c$ represents the linear projection of $x$ onto $m$, and we seek an optimal projection in some sense.

The BCM synaptic modification equations are given by:

$$\dot{m} = \mu(t)\,\phi(x \cdot m, \Theta_m)\,x, \quad m(0) = m_0, \tag{4.1}$$

their averaged version is given by:

$$\dot{\bar m} = \mu(t)\,E[\phi(x \cdot \bar m, \Theta_{\bar m})\,x], \quad \bar m(0) = m_0. \tag{4.2}$$

$\mu(t)$ is a global modulator which is assumed to take into account all the global factors affecting the cell, e.g., the beginning or end of the critical period, or state of arousal (Bear and Cooper, 1988).

Equation (4.2) is shown to be a dimensionality reduction method based on a cost function that favors directions $m$ for which the distribution of the inputs is different from normal by means of skewness (Intrator, 1990).

Our aim is to show the convergence of the stochastic differential equation. This will be done in two steps: first we show that the averaged deterministic equation converges, and then we use theorem 3.2 to show the convergence of the random differential equation to its averaged deterministic equation.
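The two objects being compared can be simulated side by side. The following is a minimal sketch under several assumptions made here, not in the paper: a two-point input law on the coordinate vectors of $R^2$, the modification function taken as $\phi(c,\Theta) = c^2 - \frac{4}{3}c\,\Theta$, a modulator $\mu(t) = \mu_0(1+t)^{-p}$ with $p = 0.6$, and plain Euler steps. The toy input is chosen for transparency rather than to satisfy every assumption of this section (for instance, the variance lower bound fails along $m_1 = m_2$).

```python
import random

INPUTS = [(1.0, 0.0), (0.0, 1.0)]     # two equiprobable inputs in the unit ball (toy choice)

def theta(m):
    """Theta_m = E[(x . m)^2] under the two-point input law."""
    return sum((x[0] * m[0] + x[1] * m[1]) ** 2 for x in INPUTS) / len(INPUTS)

def phi(c, th):
    """Assumed modification function phi(c, Theta) = c^2 - (4/3) c Theta."""
    return c * c - (4.0 / 3.0) * c * th

def mu(t, mu0=1.0, p=0.6):
    """Assumed global modulator mu(t) = mu0 * (1 + t)^(-p)."""
    return mu0 * (1.0 + t) ** (-p)

def simulate(m0=(0.8, 0.2), T=4000.0, dt=0.05, seed=0):
    """Euler-integrate the random equation (one sampled input per step) and its averaged version."""
    rng = random.Random(seed)
    m, mbar = list(m0), list(m0)
    for i in range(int(round(T / dt))):
        step = dt * mu(i * dt)
        x = rng.choice(INPUTS)                                   # random equation
        f = phi(x[0] * m[0] + x[1] * m[1], theta(m))
        m = [m[j] + step * f * x[j] for j in range(2)]
        th = theta(mbar)                                         # averaged equation
        drift = [0.0, 0.0]
        for xx in INPUTS:
            fb = phi(xx[0] * mbar[0] + xx[1] * mbar[1], th)
            for j in range(2):
                drift[j] += fb * xx[j] / len(INPUTS)
        mbar = [mbar[j] + step * drift[j] for j in range(2)]
    return m, mbar

m, mbar = simulate()
print(m, mbar)
```

With this starting point both trajectories are expected to settle near the selective state $(3/2, 0)$, a fixed point of the averaged field for this input law; other initial conditions may select the other coordinate.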

The  convergence  of  the  deterministic  equation 

Without loss of generality, we may assume that the random process $x$ is in the unit ball in $R^N$, and $\mathrm{Var}(x \cdot m) \ge \lambda \|m\|^2 > 0$, which simply says that $x$ does not lie in a subspace or a manifold of $R^N$. Since we are interested in dimensionality reduction, we can always reduce a priori the dimensionality of $x$ so that it will span $R^N$ for some $N$. When the theory is applied to a finite-valued random vector, we can restrict $m$ to be in the span of $x_1, \ldots, x_n$.
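When we multiply both sides of equation (4.2) by $\bar m_\mu$, assuming none of its components is zero, we get:

$$\begin{aligned}
\frac{1}{2}\frac{d}{dt}\|\bar m_\mu\|^2 &= \mu(t)\big\{E[(x \cdot \bar m_\mu)^3] - \tfrac{4}{3} E^2[(x \cdot \bar m_\mu)^2]\big\} \\
&\le \mu(t)\big\{\|\bar m_\mu\|^3 - \tfrac{4}{3}\mathrm{Var}^2(x \cdot \bar m_\mu)\big\} \\
&\le \mu(t)\big\{\|\bar m_\mu\|^3 - \tfrac{4}{3}\lambda^2\|\bar m_\mu\|^4\big\} \\
&= \mu(t)\,\|\bar m_\mu\|^3\big\{1 - \tfrac{4}{3}\lambda^2\|\bar m_\mu\|\big\},
\end{aligned}$$

which implies that $\|\bar m_\mu\|$ is bounded, since the norm decreases whenever $\|\bar m_\mu\| > \frac{3}{4\lambda^2}$.

Using this fact we can now show the convergence of $\bar m_\mu$. We observe that $\dot{\bar m}_\mu = -\nabla R$, where $R = -\frac{\mu}{3}\{E[(x \cdot \bar m_\mu)^3] - E^2[(x \cdot \bar m_\mu)^2]\}$ is the risk. $R$ is bounded from below since $\|\bar m_\mu\|$ is bounded; therefore $\bar m_\mu$ converges to a local minimum of $R$. □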

The convergence of the stochastic equation

Claim Under the above conditions $m_\mu(t)$ converges in $L^2$ to a local minimum of the risk.

Proof: The calculation above implies that $m_\mu$ is bounded for (almost) every $\mu$.

In our case $B_1$, $B_2$, $B_3$ and $B_4$ are independent of $t$ or $m_\mu$; therefore, if we replace $\epsilon(t)$ by $\mu(t)$ and apply theorem 3.2, we get

$$\sup_{t \ge 0} E\,|m_\mu(t) - \bar m_\mu(t)|^2 \to 0 \quad \text{as } \mu_0 \to 0.$$

The solution to the deterministic equation will converge to the same local minimum $y$, $\forall \mu$ if $\mu_0 \le C$, for some positive constant $C$. Therefore we can choose $T$ for which $|\bar m_\mu(t) - y| \le \frac{\delta}{2}$, $\forall \mu_0 \le C$, $t \ge T$; then for $t \ge T$ we have:

$$|m_\mu(t) - y| \le |m_\mu(t) - \bar m_\mu(t)| + |\bar m_\mu(t) - y| \le |m_\mu(t) - \bar m_\mu(t)| + \frac{\delta}{2}$$

$$\Rightarrow \quad \sup_{t \ge T} E\,|m_\mu(t) - y| \le \sup_{t \ge T} E\,|m_\mu(t) - \bar m_\mu(t)| + \frac{\delta}{2} \to \frac{\delta}{2} \quad \text{as } \mu_0 \to 0.$$

$\delta$ is arbitrary, which implies that

$$E\,|m_\mu(t) - y| \to 0 \quad \text{as } \mu_0 \to 0.$$

□


5.  Summary 

It has been shown that under mild conditions, the equations $\dot{x}_\epsilon = \epsilon H(x,\omega,t)$ and $\dot{y}_\epsilon = \epsilon G(y,t)$, where $G(x,t) = E[H(x,\omega,t)]$, have close trajectories in the infinite interval when $\epsilon(t) \le t^{-1/2}$. The result may be computationally useful and, as has been shown in the example, may assist in the analysis of the random differential equation.